This homework covers the multiple linear model and stepwise model selection lectures. All work must be submitted electronically. For full credit you must show all of your steps. Use of computational tools (e.g., R) is encouraged; when you do use them, code inputs and outputs must be shown in-line (not as an appendix) and accompanied by plain English that briefly explains what the code is doing.
This question involves data from a study on the nutrition of infants and preschool children in the north central region of the United States of America. It is available on the course web page as nutrition.csv, and contains 72 observations of boys’ weight/height ratio (woh) for equally spaced values of age in months.
nutrition <- read.csv("~/Documents/Documents/Machine Learning/nutrition.csv")
fit <- lm(woh ~ age, data=nutrition)
p <- predict(fit, newdata = nutrition, interval = "prediction")
plot(nutrition$age, nutrition$woh, main="prediction interval", xlab = "age", ylab="woh")
x <- seq(0,max(nutrition$age),length=length(nutrition$age))
lines(x, p[,1])
lines(x, p[,2], col=2, lty=2)
lines(x, p[,3], col=2, lty=2)
The fitted model gives a 95% prediction interval: roughly 95% of new observations are expected to fall between the red dashed lines on the graph. The R-squared value is 0.8199, which means approximately 82% of the variability of the response around its mean is explained by the model.
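A quick sketch on simulated data (not the nutrition data) of why the prediction band above is the right object for new observations: at the same x value, a prediction interval for a new observation is always wider than a confidence interval for the mean response, because it also accounts for the noise in a single new y.

```r
## Simulated illustration: prediction vs. confidence intervals from lm()
set.seed(3)
x <- runif(50, 0, 10)
y <- 1 + 0.5 * x + rnorm(50, sd = 0.5)
fit <- lm(y ~ x)
nd <- data.frame(x = 5)
conf_int <- predict(fit, nd, interval = "confidence")   # interval for E[y | x = 5]
pred_int <- predict(fit, nd, interval = "prediction")   # interval for a new y at x = 5
## The prediction interval is wider than the confidence interval:
(pred_int[, "upr"] - pred_int[, "lwr"]) > (conf_int[, "upr"] - conf_int[, "lwr"])  # TRUE
```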
early <- as.factor(nutrition$age < 6.5)   # indicator: TRUE for age < 6.5 months
reg <- lm(nutrition$woh ~ nutrition$age + early + nutrition$age:early)
conf <- confint(reg, level = .95)
##intercept, slope, CI for first group:
c(reg$coefficients[1] + reg$coefficients[3], reg$coefficients[2] + reg$coefficients[4])
## (Intercept) nutrition$age
## 0.43304762 0.04342857
## intercept CI
c(conf[1,1] + conf[3,1], conf[1,2] + conf[3,2])
## [1] 0.3756907 0.4904045
## slope CI
c(conf[2,1] + conf[4,1], conf[2,2] + conf[4,2])
## [1] 0.03120052 0.05565662
##intercept, slope, CI for second group:
c(reg$coefficients[1], reg$coefficients[2])
## (Intercept) nutrition$age
## 0.719655995 0.004157843
## intercept CI
c(conf[1,1], conf[1,2])
## [1] 0.7056933 0.7336187
## slope CI
c(conf[2,1], conf[2,2])
## [1] 0.003836169 0.004479518
Yes, both the intercepts and the slopes are statistically different from 0: none of the 95% confidence intervals above contain zero.
age ranges (i.e., so that they do not overlap). Comment on the goodness of fit of this new model.
plot(nutrition$age, nutrition$woh, main="segmented fit", xlab = "age", ylab="woh")
x <- seq(6.5,max(nutrition$age),length=length(nutrition$age))
i1 = reg$coefficients[1] + reg$coefficients[3]
s1 = reg$coefficients[2] + reg$coefficients[4]
i2 = reg$coefficients[1]
s2 = reg$coefficients[2]
ci1 <- conf[1,1] + conf[3,1]   # lower CI bound, group-1 intercept
si1 <- conf[2,1] + conf[4,1]   # lower CI bound, group-1 slope
ci2 <- conf[1,2] + conf[3,2]   # upper CI bound, group-1 intercept
si2 <- conf[2,2] + conf[4,2]   # upper CI bound, group-1 slope
segments(x0 = -1, y0=i1 + -1*s1, x1=6.5, y1=i1+6.5*s1)
segments(x0 = 6.5, y0=i2 + 6.5*s2, x1=80, y1=i2+80*s2)
segments(x0 = 6.5, y0=conf[1] + 6.5*conf[2], x1=80, y1=conf[1]+80*conf[2], col=2, lty=2)
segments(x0 = 6.5, y0=conf[1,2] + 6.5*conf[2,2], x1=80, y1=conf[1,2]+80*conf[2,2], col=2, lty=2)
segments(x0 = -1, y0=ci1 + -1*si1, x1=6.5, y1=ci1+6.5*si1, col=2, lty=2)
segments(x0 = -1, y0=ci2 + -1*si2, x1=6.5, y1=ci2+6.5*si2, col=2, lty=2)
fit = lm(nutrition$woh ~ nutrition$age)
anova(fit, reg)
## Analysis of Variance Table
##
## Model 1: nutrition$woh ~ nutrition$age
## Model 2: nutrition$woh ~ nutrition$age + early + nutrition$age:early
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 70 0.188623
## 2 68 0.042335 2 0.14629 117.49 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The partial F test tells us whether the added predictors are worthwhile. The F statistic obtained from the ANOVA above is highly statistically significant (p < 2.2e-16). The null hypothesis is rejected, meaning the added predictors are useful and the fit from part (b) is better.
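As a sanity check, the partial F statistic can be recovered by hand from the residual sums of squares and degrees of freedom reported in the ANOVA table: \[
F = \frac{(\mathrm{RSS}_1 - \mathrm{RSS}_2)/(\mathrm{df}_1 - \mathrm{df}_2)}{\mathrm{RSS}_2/\mathrm{df}_2} = \frac{(0.188623 - 0.042335)/2}{0.042335/68} \approx 117.49,
\] matching the F value R reports.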
In 1988, US cattle producers voted on whether or not to each pay a dollar per head towards the marketing campaigns of the American Beef Council. At the time of this vote, the council’s TV campaign featured a voice-over by actor Robert Mitchum, using the theme “Beef – it’s what’s for dinner.” To understand the vote results (it passed), the Montana state cattlemen’s association looked at the effect of the physical size of the farm and the value of the farms’ gross revenue on voter preference. The data (in beef.csv) consist of the vote results (%YES), average size of farm (hundreds of acres), and average value of products sold annually by each farm (in K$) for each of Montana’s 56 counties.
beef <- read.csv("~/Downloads/beef.csv")
plot(beef)
This plot shows a fairly strong positive correlation between size and val. There is a moderate negative correlation between yes and size, and a weak negative correlation between yes and val. There is multicollinearity because the predictors (size and val) are correlated with each other. They are actually more strongly correlated with each other than either is with the response, so it will be harder to make confident inferences about these predictors. There is also some bunching (more data points for val at the lower x values), so we will probably want to take the logarithm.
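A sketch on simulated data (not the beef data) of the inference problem multicollinearity causes: when two predictors are strongly correlated, the coefficient standard errors in the joint regression are inflated relative to a regression on one predictor alone.

```r
## Simulated illustration of standard-error inflation under multicollinearity
set.seed(2)
n  <- 100
x1 <- rnorm(n)
x2 <- 0.95 * x1 + 0.1 * rnorm(n)        # nearly collinear with x1
y  <- 1 + x1 + x2 + rnorm(n)
se_both  <- summary(lm(y ~ x1 + x2))$coefficients["x1", "Std. Error"]
se_alone <- summary(lm(y ~ x1))$coefficients["x1", "Std. Error"]
se_both > se_alone   # TRUE: the SE for x1 is much larger with x2 in the model
```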
YES with both SIZE and \(\log(\)VAL\()\) as covariates. Interpret the results. What regression assumptions might we have violated here?
attach(beef)
reg = lm(YES ~ SIZE + log(VAL))
summary(reg)
##
## Call:
## lm(formula = YES ~ SIZE + log(VAL))
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.2200 -5.3463 0.5873 7.7057 28.0153
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 93.71445 9.95022 9.418 6.55e-13 ***
## SIZE -0.24275 0.08743 -2.777 0.00758 **
## log(VAL) -2.38614 2.79193 -0.855 0.39659
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.36 on 53 degrees of freedom
## Multiple R-squared: 0.2606, Adjusted R-squared: 0.2327
## F-statistic: 9.342 on 2 and 53 DF, p-value: 0.0003346
SIZE is a significant predictor, as the summary shows a low p value. However, VAL may not be a good predictor, since it has a high p value and is not significant. This reflects a violation of the assumption that the predictor variables are independent of one another (multicollinearity), which leads to large standard errors.
SIZE change depending on \(\log(\)VAL\()\)? What is your estimate of the effect on YES of a unit change in SIZE? Interpret your conclusion.
attach(beef)
## The following objects are masked from beef (pos = 3):
##
## SIZE, VAL, YES
reg1 <- lm(YES ~ SIZE * log(VAL))
summary(reg1)
summary(lm(YES ~ SIZE))
##
## Call:
## lm(formula = YES ~ SIZE)
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.8015 -5.8995 0.5392 7.9553 28.4878
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 85.48991 2.52366 33.875 < 2e-16 ***
## SIZE -0.28940 0.06813 -4.248 8.57e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.33 on 54 degrees of freedom
## Multiple R-squared: 0.2505, Adjusted R-squared: 0.2366
## F-statistic: 18.04 on 1 and 54 DF, p-value: 8.572e-05
Yes, the effect of SIZE changes depending on log(VAL): in the interaction model, the SIZE:log(VAL) term is significant based on its small p value. Based on the simple regression output above, there is a negative relationship between SIZE and YES: for every unit increase in SIZE, YES decreases by about 0.289 percentage points.
Consider a situation where each \((x_i, y_i)\) pair is given a weight \(w_i > 0\), for \(i=1,\dots,n\) measuring its importance relative to the other pairs in the data. Otherwise the multiple linear regression setup is the same as always: \(x_i\) and \(\beta\) are \(p\)-vectors, the variance is constant, etc.
(5 pts) Ignoring for the moment the typical Gaussianity assumption on the noise, consider the following loss function, a weighted analog of ordinary least squares. \[
\sum_{i=1}^n w_i (y_i - x_i^\top \beta)^2
\] Find an expression for the \(\hat{\beta}\) that minimizes this loss. Hint: You might find it helpful to vectorize the loss function above.
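A sketch of the shape of the answer, following the vectorization hint: with \(W = \mathrm{diag}(w_1, \dots, w_n)\), the loss becomes \[
(y - X\beta)^\top W (y - X\beta),
\] whose gradient in \(\beta\) is \(-2X^\top W(y - X\beta)\); setting it to zero gives the weighted normal equations \(X^\top W X \hat{\beta} = X^\top W y\), so \(\hat{\beta} = (X^\top W X)^{-1} X^\top W y\) when \(X^\top W X\) is invertible.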
(4 pts) Now, suppose that your data comprised \(n\) unique \(p\)-variate \(x_i\) predictors, and at each one of those settings you had \(w_i \geq 1\) observations \(y_{i1}, \dots, y_{iw_i}\). These are \(w_i\) replicates observed at \(x_i\). Write down the full multiple linear regression model for such data, assuming \(\epsilon_{ij} \stackrel{\mathrm{iid}}{\sim} \mathcal{N}(0, \sigma^2)\), and derive an expression for the corresponding log likelihood. Hint: consider using a double sum.
(5 pts) Show that the MLE \(\hat{\beta}\) from the likelihood in (b) may be found by applying your results from (a–b) using \(y_i \equiv \bar{y}_i = \frac{1}{w_i} \sum_{j=1}^{w_i} y_{ij}\). That is, although you have \(\sum_{i=1}^n w_i \equiv N\) data points, and potentially \(N \gg n\), once you have the \(\bar{y}_i\)’s your calculations need only involve working with \(n\) quantities, and you’ll get the same answer as if you had performed the calculations with all \(N\) \(y\)-values.
(6 pts) However, show that using the \(n\) averages \(\bar{y}_i\) is not sufficient for estimating \(\hat{\sigma}^2\). That is, you won’t get the same answer using just the \(\bar{y}_i\)’s as you would by working with all \(N\) quantities. What must you additionally know about each collection of replicates \(y_{i1}, \dots, y_{i w_i}\) so that your calculations involve working with \(2n\) quantities rather than \(N\)? Derive \(\hat{\sigma}^2\) using those \(2n\) quantities.
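A useful identity for getting started (a sketch, not the full derivation): for each \(i\), \[
\sum_{j=1}^{w_i} (y_{ij} - x_i^\top \hat{\beta})^2 = \sum_{j=1}^{w_i} (y_{ij} - \bar{y}_i)^2 + w_i (\bar{y}_i - x_i^\top \hat{\beta})^2,
\] so alongside each \(\bar{y}_i\) it suffices to also record the within-replicate sum of squares \(s_i = \sum_{j=1}^{w_i} (y_{ij} - \bar{y}_i)^2\), giving the \(2n\) quantities.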
Consider the linear model \(Y = X\beta + \epsilon\), where \(X\) is a known \(n \times p\) matrix, \(\epsilon \sim N_n(0,\sigma^2I)\), and let \(x_i^\top\) denote the \(i\)th row of \(X\).
(7 pts) Let \(\hat{\beta} = (X^\top X)^{-1}X^\top Y\) denote the MLE in the model with all \(n\) observations, and let \(\hat{\beta}_{(-i)} = (X_{(-i)}^\top X_{(-i)})^{-1}X_{(-i)}^\top Y_{(-i)}\) denote the MLE in the model with the \(i^\mathrm{th}\) observation deleted. Show that \[
\hat{\beta} - \hat{\beta}_{(-i)} = \frac{1}{1-h_i}(X^\top X)^{-1}x_i e_i,
\] where \(h_i\) is the leverage of the \(i^\mathrm{th}\) data point and \(e_i\) is our usual residual. Hint: you might find the Sherman–Morrison–Woodbury formula useful.
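For reference, the rank-one form of the Sherman–Morrison–Woodbury formula used in the hint: for invertible \(A\) and vectors \(u, v\) with \(v^\top A^{-1} u \neq 1\), \[
(A - uv^\top)^{-1} = A^{-1} + \frac{A^{-1} u v^\top A^{-1}}{1 - v^\top A^{-1} u},
\] applied with \(A = X^\top X\) and \(u = v = x_i\), since \(X_{(-i)}^\top X_{(-i)} = X^\top X - x_i x_i^\top\).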
(6 pts) Cook’s distance of the observation \((x_i,y_i)\) is defined as \[
D_i = \frac{1}{ps^2}(\hat{\beta}_{(-i)} -
\hat{\beta})^\top X^\top X(\hat{\beta}_{(-i)} - \hat{\beta}), \quad
i=1,\ldots,n,
\] where \(s^2 = \|Y - X\hat{\beta}\|^2/(n-p)\). It may appear that we need to fit \(n+1\) linear models in order to calculate all of the Cook’s distances. Deduce, however, that \[
D_i = \frac{1}{p}\Bigl(\frac{h_i}{1-h_i}\Bigr)\tilde{r}_i^2,
\] where \(\tilde{r}_i = \frac{e_i}{s\sqrt{1-h_i}}\) is (very similar to) the \(i^\mathrm{th}\) studentized fitted residual (given in class). Actually, this is called the internally studentized residual, whereas our version in class is the externally studentized one.
(2 pts) Briefly describe what Cook’s distance is measuring and why it might be helpful.
Cook’s distance gives you a sense of the influence the ith data point has. It can be used to indicate very influential points so they can be checked for validity.
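The shortcut formula from part (b) can be checked numerically. A sketch on simulated data (not any dataset from this homework): compute \(D_i\) from the leverages and internally studentized residuals of a single fit, and compare against R's built-in cooks.distance(), confirming that no refitting is needed.

```r
## Verify D_i = (1/p) * (h_i / (1 - h_i)) * r_i^2 against cooks.distance()
set.seed(1)
x <- rnorm(20)
y <- 1 + 2 * x + rnorm(20)
fit <- lm(y ~ x)
h  <- hatvalues(fit)                  # leverages h_i
e  <- resid(fit)                      # ordinary residuals e_i
p  <- length(coef(fit))               # number of coefficients
s2 <- sum(e^2) / df.residual(fit)     # s^2 = RSS / (n - p)
r  <- e / sqrt(s2 * (1 - h))          # internally studentized residuals
D  <- (1 / p) * (h / (1 - h)) * r^2
all.equal(unname(D), unname(cooks.distance(fit)))   # TRUE
```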
Researchers at General Motors collected data on 60 U.S. Standard Metropolitan Statistical Areas (SMSA’s) in a study of whether or not air pollution contributes to mortality. The data are available as smsa.csv on the course web page. The dependent variable for analysis is age adjusted Mortality rate. Several (potential) explanatory variables include measures of demographic characteristics of the cities, of climate characteristics and the air pollution potential of three different chemicals. Specifically:
| Variable | Explanation |
|---|---|
| JanT | Mean January temperature (degrees Fahrenheit) |
| JulyT | Mean July temperature (degrees Fahrenheit) |
| RelHum | Relative Humidity |
| Rain | Annual rainfall (inches) |
| Edu | Median education |
| PopD | Population density |
| NonWht | Percentage of non whites |
| WC | Percentage of white collar workers |
| Pop | Population |
| PHouse | Population per household |
| Income | Median income |
| HCPot | HC pollution potential |
| NOxPot | Nitrous Oxide pollution potential |
| SO2Pot | Sulfur Dioxide pollution potential |
Mortality rate versus each of the predictor variables, above, and comment. Do the plots suggest any transformations that might be helpful?
smsa <- read.csv("~/Documents/Documents/Machine Learning/smsa.csv")
attach(smsa)
par(mfrow=c(3,5), mar=c(5, 2, 1, 1))
plot(smsa[,1], smsa$Mortality)
for(i in c(1:4, 6:15)){
plot(smsa[,i], smsa$Mortality, xlab=names(smsa)[i], ylab = "Mortality")
}
Many of the plots are bunched up at the lower x values, suggesting that log transformations might be helpful. Mortality appears to decrease with education and income, and to increase with temperature and NonWht.
YX <- data.frame(Mortality = smsa$Mortality, JanT = smsa$JanT, JulyT = smsa$JulyT,
                 RelHum = smsa$RelHum, Rain = smsa$Rain, Edu = smsa$Edu,
                 PopD = smsa$PopD, NonWht = smsa$NonWht, WC = smsa$WC,
                 lpop = log(smsa$Pop), PHouse = smsa$PHouse, Income = smsa$Income,
                 lHCPot = log(smsa$HCPot), lNoxPot = log(smsa$NOxPot),
                 lS02Pot = smsa$S02Pot)
attach(YX)
## The following objects are masked from smsa:
##
## Edu, Income, JanT, JulyT, Mortality, NonWht, PHouse, PopD, Rain,
## RelHum, WC
fit1 <-lm(Mortality ~ ., data=YX)
summary(fit1)
##
## Call:
## lm(formula = Mortality ~ ., data = YX)
##
## Residuals:
## Min 1Q Median 3Q Max
## -65.517 -20.263 1.184 18.478 89.902
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.227e+03 2.624e+02 4.676 2.68e-05 ***
## JanT -1.815e+00 7.237e-01 -2.508 0.0158 *
## JulyT -1.664e+00 1.983e+00 -0.839 0.4058
## RelHum 5.567e-01 1.006e+00 0.553 0.5827
## Rain 1.310e+00 5.545e-01 2.362 0.0226 *
## Edu -7.667e+00 8.851e+00 -0.866 0.3910
## PopD 3.863e-03 4.175e-03 0.925 0.3597
## NonWht 4.992e+00 9.579e-01 5.212 4.53e-06 ***
## WC -1.752e+00 1.193e+00 -1.469 0.1488
## lpop 2.698e+00 7.595e+00 0.355 0.7241
## PHouse -3.519e+01 3.760e+01 -0.936 0.3543
## Income -5.276e-04 1.277e-03 -0.413 0.6814
## lHCPot -2.525e+01 1.459e+01 -1.731 0.0903 .
## lNoxPot 2.896e+01 1.371e+01 2.113 0.0402 *
## lS02Pot 1.007e-01 1.171e-01 0.860 0.3946
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 34.15 on 45 degrees of freedom
## Multiple R-squared: 0.7702, Adjusted R-squared: 0.6987
## F-statistic: 10.77 on 14 and 45 DF, p-value: 4.085e-10
The transformations reveal that several variables are useful that didn’t originally appear to have a strong correlation with mortality, such as JanT, Rain, and lNoxPot. The R-squared value shows that approximately 77% (R² = 0.7702) of the variability is explained by the predictors, so it’s a pretty good fit. The variables with very small t statistics (like RelHum) are the least useful for explaining the trend in mortality.
fit2 = lm(Mortality ~ . -RelHum -Income -lS02Pot -lpop -JulyT -PopD -PHouse -Edu -lHCPot, data=YX)
summary(fit2)
##
## Call:
## lm(formula = Mortality ~ . - RelHum - Income - lS02Pot - lpop -
## JulyT - PopD - PHouse - Edu - lHCPot, data = YX)
##
## Residuals:
## Min 1Q Median 3Q Max
## -72.209 -22.440 -0.338 20.818 87.967
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 938.0565 47.7769 19.634 < 2e-16 ***
## JanT -2.0190 0.5131 -3.935 0.00024 ***
## Rain 2.0896 0.4567 4.575 2.82e-05 ***
## NonWht 4.1164 0.6306 6.527 2.41e-08 ***
## WC -2.2037 0.9275 -2.376 0.02109 *
## lNoxPot 19.1780 4.1985 4.568 2.90e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 34.29 on 54 degrees of freedom
## Multiple R-squared: 0.722, Adjusted R-squared: 0.6962
## F-statistic: 28.04 on 5 and 54 DF, p-value: 6.921e-14
Fit2 is the model from (b) minus all of the variables that were not statistically significant in the first fit, leaving only the factors that significantly affect the mortality rate. The R-squared value shows that ~72% of the variation in the data is explained by these variables. This is only marginally lower than the first fit, which had more than double the explanatory variables, suggesting the ones taken away were unnecessary. NonWht and the amount of nitrous oxide (lNoxPot) are the strongest predictors of mortality.
step function, and comment on the relative models/fits. Compare the model from (c) with this new one by their approximate model probabilities.
n <- length(Mortality)
fit3 = lm(Mortality ~ . -RelHum -Income -lS02Pot -lpop -JulyT -PopD -PHouse -Edu , data=YX)
fit4 = lm(Mortality ~ . -RelHum -Income -lS02Pot -lpop -JulyT -PopD -PHouse -Edu , data=YX)
fit5 = lm(Mortality ~ . -RelHum -Income -lS02Pot -lpop -JulyT -PopD -PHouse -Edu, data=YX)
fit6 = lm(Mortality ~ . -RelHum -Income -lS02Pot -lpop -JulyT -PopD -PHouse, data=YX)
fit7 = lm(Mortality ~ . -RelHum -Income -lS02Pot -lpop -JulyT -PopD, data=YX)
fit8 = lm(Mortality ~ . -RelHum -Income -lS02Pot -lpop -JulyT, data=YX)
fit9 = lm(Mortality ~ . -RelHum -Income -lS02Pot -lpop , data=YX)
fit10 = lm(Mortality ~ . -RelHum -Income -lS02Pot , data=YX)
fit11 = lm(Mortality ~ . -RelHum -Income , data=YX)
fit12 = lm(Mortality ~ . -RelHum, data=YX)
bic1 <- extractAIC(fit1, k=log(n))[2]
bic2 <- extractAIC(fit2, k=log(n))[2]
b3 <- extractAIC(fit3, k=log(n))[2]
b4 <- extractAIC(fit4, k=log(n))[2]
b5 <- extractAIC(fit5, k=log(n))[2]
b6 <- extractAIC(fit6, k=log(n))[2]
b7 <- extractAIC(fit7, k=log(n))[2]
b8 <- extractAIC(fit8, k=log(n))[2]
b9 <- extractAIC(fit9, k=log(n))[2]
b10 <- extractAIC(fit10, k=log(n))[2]
b11 <- extractAIC(fit11, k=log(n))[2]
b12 <- extractAIC(fit12, k=log(n))[2]
bics <- c(bic1, bic2, b3, b4, b5, b6, b7, b8, b9, b10, b11, b12)
probs <- exp(-0.5 * (bics - min(bics)))
probs <- probs/sum(probs)
max(probs)
## [1] 0.3048739
The BIC comparison shows that the second model, from part (c), is the best of these candidates: it has the highest approximate model probability (about 0.30).
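The probability calculation above implements the standard BIC approximation to posterior model probabilities, \[
p(M_i \mid y) \approx \frac{\exp(-\mathrm{BIC}_i/2)}{\sum_j \exp(-\mathrm{BIC}_j/2)},
\] with \(\min_j \mathrm{BIC}_j\) subtracted from each BIC first for numerical stability (this cancels in the ratio).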
step function to help select the final model. As above, comment on the quality of fit of your final model, interpret the results and compare the model probabilities.
full <- lm(Mortality ~ .^2, data = YX)
start <- fit2
fwd <- step(start, scope=formula(full), direction = "forward", k=log(n))
## Start: AIC=442.44
## Mortality ~ (JanT + JulyT + RelHum + Rain + Edu + PopD + NonWht +
## WC + lpop + PHouse + Income + lHCPot + lNoxPot + lS02Pot) -
## RelHum - Income - lS02Pot - lpop - JulyT - PopD - PHouse -
## Edu - lHCPot
##
## Df Sum of Sq RSS AIC
## + Rain:lNoxPot 1 4300.1 59202 442.32
## <none> 63502 442.44
## + PopD 1 3501.7 60001 443.13
## + lHCPot 1 3336.6 60166 443.29
## + Edu 1 2977.7 60525 443.65
## + lS02Pot 1 2763.2 60739 443.86
## + JanT:lNoxPot 1 1129.0 62373 445.45
## + JanT:Rain 1 1000.3 62502 445.58
## + PHouse 1 837.2 62665 445.73
## + NonWht:lNoxPot 1 698.7 62804 445.87
## + WC:lNoxPot 1 679.3 62823 445.88
## + JanT:WC 1 627.5 62875 445.93
## + lpop 1 399.5 63103 446.15
## + Income 1 338.6 63164 446.21
## + NonWht:WC 1 285.3 63217 446.26
## + JanT:NonWht 1 261.3 63241 446.28
## + JulyT 1 202.2 63300 446.34
## + Rain:WC 1 179.4 63323 446.36
## + Rain:NonWht 1 95.0 63407 446.44
## + RelHum 1 0.4 63502 446.53
##
## Step: AIC=442.32
## Mortality ~ JanT + Rain + NonWht + WC + lNoxPot + Rain:lNoxPot
##
## Df Sum of Sq RSS AIC
## <none> 59202 442.32
## + PopD 1 3175.9 56026 443.11
## + lS02Pot 1 2355.8 56846 443.98
## + PHouse 1 2342.7 56859 443.99
## + Edu 1 1702.4 57500 444.67
## + Rain:NonWht 1 930.5 58272 445.47
## + lHCPot 1 912.9 58289 445.48
## + NonWht:lNoxPot 1 575.7 58626 445.83
## + NonWht:WC 1 483.9 58718 445.92
## + JanT:WC 1 392.8 58809 446.02
## + Income 1 384.5 58818 446.03
## + lpop 1 377.7 58824 446.03
## + JanT:Rain 1 356.5 58846 446.05
## + JanT:lNoxPot 1 230.0 58972 446.18
## + JanT:NonWht 1 212.1 58990 446.20
## + Rain:WC 1 141.2 59061 446.27
## + JulyT 1 70.2 59132 446.35
## + RelHum 1 26.1 59176 446.39
## + WC:lNoxPot 1 22.5 59180 446.39
back <- step(start, scope=formula(full), direction = "backward", k=log(n))
## Start: AIC=442.44
## Mortality ~ (JanT + JulyT + RelHum + Rain + Edu + PopD + NonWht +
## WC + lpop + PHouse + Income + lHCPot + lNoxPot + lS02Pot) -
## RelHum - Income - lS02Pot - lpop - JulyT - PopD - PHouse -
## Edu - lHCPot
##
## Df Sum of Sq RSS AIC
## <none> 63502 442.44
## - WC 1 6638 70140 444.31
## - JanT 1 18210 81712 453.47
## - lNoxPot 1 24536 88038 457.94
## - Rain 1 24619 88121 458.00
## - NonWht 1 50103 113606 473.24
b13 <- extractAIC(fwd, k=log(n))[2]
b14 <- extractAIC(back, k=log(n))[2]
bics <- c(bic1, bic2, b3, b4, b5, b6, b7, b8, b9, b10, b11, b12, b13, b14)
probs <- exp(-0.5 * (bics - min(bics)))
probs <- probs/sum(probs)
max(probs)
## [1] 0.1981938
##final model:
fwd
##
## Call:
## lm(formula = Mortality ~ JanT + Rain + NonWht + WC + lNoxPot +
## Rain:lNoxPot, data = YX)
##
## Coefficients:
## (Intercept) JanT Rain NonWht WC
## 985.9055 -1.1965 0.4767 3.2803 -2.4324
## lNoxPot Rain:lNoxPot
## -2.8723 0.6878
The model found by forward stepwise selection now has the highest approximate model probability, making it the best of the candidate models. It is the same as the model obtained by inspecting t-values in part (c), except that it also includes the Rain:lNoxPot interaction.